RL's Edge: MIT Shows Reinforcement Learning Cuts Catastrophic Forgetting vs Supervised Fine-Tuning
MIT researchers show that on-policy reinforcement learning preserves a model's prior capabilities better than supervised fine-tuning, because on-policy RL is implicitly biased toward solutions with low forward KL divergence between the base and fine-tuned models.
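To make the metric concrete, here is a minimal sketch of a forward KL divergence between a base and a fine-tuned model's output distributions for a single prompt. It assumes "forward" means KL(π_base || π_ft), with the expectation taken under the base model. That convention is an assumption here, as are the function name `forward_kl` and the toy three-token distributions, which are hypothetical and do not come from the study.

```python
import math


def forward_kl(p_base, p_ft, eps=1e-12):
    """KL(p_base || p_ft) for two categorical distributions on the same support.

    Assumed convention: "forward" KL takes the expectation under the base
    model p_base. Outcomes the base model never produces (pb == 0) add nothing.
    """
    if len(p_base) != len(p_ft):
        raise ValueError("distributions must share the same support")
    total = 0.0
    for pb, pf in zip(p_base, p_ft):
        if pb > 0.0:
            # eps guards against log(0) if the fine-tuned model assigns zero mass
            total += pb * (math.log(pb) - math.log(max(pf, eps)))
    return total


# Hypothetical next-token distributions over a 3-token vocabulary for one prompt.
base = [0.5, 0.3, 0.2]
ft_small_shift = [0.45, 0.35, 0.2]  # stays close to the base model
ft_large_shift = [0.05, 0.15, 0.8]  # moves far from the base model

print(forward_kl(base, ft_small_shift))
print(forward_kl(base, ft_large_shift))
```

Under the article's claim, a fine-tuned model resembling `ft_small_shift` (small forward KL from the base) would be expected to forget less than one resembling `ft_large_shift`. In practice the divergence would be averaged over prompts from the new task rather than computed for one toy distribution.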